Understanding the Overfitting of the Episodic Meta-training
Despite the success of two-stage few-shot classification methods, the model
suffers from severe overfitting during the episodic meta-training stage. We
hypothesize that this is caused by over-discrimination, i.e., the model learns
to over-rely on superficial features that fit base-class discrimination while
suppressing generalization to novel classes. To penalize over-discrimination,
we introduce knowledge distillation techniques that retain the teacher model's
novel-class generalization knowledge during training.
Specifically, we select as the teacher the model snapshot with the best
validation accuracy during meta-training and constrain the symmetric
Kullback-Leibler (SKL) divergence between the output distribution of the
teacher's linear classifier and that of the student. This simple approach outperforms
the standard meta-training process. We further propose the Nearest Neighbor
Symmetric Kullback-Leibler (NNSKL) divergence for meta-training to push the
limits of knowledge distillation techniques. NNSKL takes few-shot tasks as
input and penalizes the output of the nearest neighbor classifier, thereby
influencing the relationships between query embeddings and support
centers. By combining SKL and NNSKL in meta-training, the model achieves even
better performance and surpasses state-of-the-art results on several
benchmarks.
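
To make the two penalties concrete, below is a minimal PyTorch sketch of how SKL and NNSKL terms could be computed, assuming prototype-style support centers and a nearest neighbor classifier whose logits are negative query-to-center distances; the function names, the temperature, and the loss weights are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the SKL and NNSKL distillation penalties; all names
# and hyperparameters here are assumptions, not the authors' code.
import torch
import torch.nn.functional as F


def skl_divergence(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between student and teacher output distributions."""
    log_p = F.log_softmax(student_logits, dim=-1)
    log_q = F.log_softmax(teacher_logits, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)


def nn_logits(query_emb: torch.Tensor, support_centers: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Nearest neighbor classifier: logits are negative distances from each
    query embedding to the class-wise support centers."""
    return -torch.cdist(query_emb, support_centers) / tau  # [n_query, n_way]


def distillation_penalty(student_cls_logits, teacher_cls_logits,
                         student_query, teacher_query,
                         student_centers, teacher_centers,
                         w_skl: float = 1.0, w_nnskl: float = 1.0) -> torch.Tensor:
    """SKL acts on the linear-classifier outputs of the whole base-class space;
    NNSKL acts on the per-task nearest neighbor outputs, i.e. on the
    relationships between query embeddings and support centers."""
    skl = skl_divergence(student_cls_logits, teacher_cls_logits)
    nnskl = skl_divergence(nn_logits(student_query, student_centers),
                           nn_logits(teacher_query, teacher_centers))
    return w_skl * skl + w_nnskl * nnskl
```

In this reading, the penalty would simply be added to the standard episodic meta-training loss, with the teacher's logits computed from a frozen copy of the best-validation-accuracy checkpoint.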